Efficient maximum common subgraph (MCS) searching of large chemical databases
Abstract
Despite dramatic improvements in the hardware resources and computational power available to pharmaceutical researchers over the past few decades, the methods used for assessing the 2D chemical similarity between two molecules have not changed much since the 1960s. Here we report a novel chemical database search method that allows the exact size of the maximum common edge subgraph (MCES) between a query molecule and molecules in a database to be calculated rapidly. Using a pre-computed index, the 50 nearest neighbors of a query can be determined in a few seconds, even for databases containing millions of compounds. This work builds upon the previous efforts of Wipke and Rogers in the 1980s [1] and of Messmer and Bunke in the 1990s [2], harnessing the advances in high-performance computing and storage technology now available. A graphical depiction of such a “SmallWorld” index is shown below.
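To make the quantity being searched on concrete, here is a minimal Python sketch of MCES-based nearest-neighbor ranking. It is not the indexed method reported in the abstract: it computes the MCES size by brute force over atom mappings, which is only feasible for a handful of atoms, and it does not require the common subgraph to be connected. The graph encoding (hydrogen-suppressed atoms, bond orders ignored), the function names `make_graph`, `mces_size`, `mces_distance` and `nearest_neighbors`, and the distance |E1| + |E2| - 2·|MCES| are all assumptions made for illustration.

```python
from heapq import nsmallest
from itertools import permutations


def make_graph(labels, bonds):
    """Hypothetical toy encoding: atom labels plus a set of undirected bonds."""
    return list(labels), {frozenset(b) for b in bonds}


def mces_size(g1, g2):
    """Exact MCES size (number of shared bonds) by exhaustive atom mapping.

    Exponential in the number of atoms, so suitable for toy inputs only.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    # Always map the graph with fewer atoms into the one with more atoms.
    if len(labels1) > len(labels2):
        labels1, edges1, labels2, edges2 = labels2, edges2, labels1, edges1
    best = 0
    # image[i] is the atom of the larger graph assigned to atom i.
    for image in permutations(range(len(labels2)), len(labels1)):
        common = 0
        for edge in edges1:
            i, j = tuple(edge)
            if (labels1[i] == labels2[image[i]]
                    and labels1[j] == labels2[image[j]]
                    and frozenset((image[i], image[j])) in edges2):
                common += 1
        best = max(best, common)
    return best


def mces_distance(g1, g2):
    """Assumed distance: bonds in either molecule that are not in the MCES."""
    return len(g1[1]) + len(g2[1]) - 2 * mces_size(g1, g2)


def nearest_neighbors(query, database, k):
    """Names of the k database entries closest to the query."""
    return nsmallest(k, database, key=lambda name: mces_distance(query, database[name]))


# Toy molecules (hydrogens omitted).
ethanol = make_graph("CCO", [(0, 1), (1, 2)])
propane = make_graph("CCC", [(0, 1), (1, 2)])
propanol = make_graph("CCCO", [(0, 1), (1, 2), (2, 3)])
dimethyl_ether = make_graph("COC", [(0, 1), (1, 2)])

database = {
    "propane": propane,
    "propanol": propanol,
    "dimethyl_ether": dimethyl_ether,
    "ethanol": ethanol,
}

print(nearest_neighbors(ethanol, database, 2))
```

With these toy inputs, ethanol shares both of its bonds with propanol but only one bond with propane or dimethyl ether, so propanol ranks directly after ethanol itself. A linear scan like this recomputes every comparison per query; the pre-computed index described in the abstract is what makes searching millions of compounds practical.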
Related resources
Chemical similarity searching using a neural graph matcher
A neural graph matcher based on Correlation Matrix Memories is evaluated in terms of efficiency and effectiveness against two maximum common subgraph (mcs) algorithms. The algorithm removes implausible solutions below a user-defined threshold and runs faster than conventional mcs methods on our database of chemical graphs while being slightly less effective.
Small Molecule Subgraph Detector (SMSD) toolkit
Background: Finding one small molecule (query) in a large target library is a challenging task in computational chemistry. Although several heuristic approaches are available using fragment-based chemical similarity searches, they fail to identify exact atom-bond equivalence between the query and target molecules and thus cannot be applied to complex chemical similarity searches, such as searchi...
Maximum common subgraph isomorphism algorithms for the matching of chemical structures
The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper provides a classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinf...
Making the most of approximate maximum common substructure search
The maximum common substructure (MCS) problem is of great importance in multiple aspects of chemoinformatics. It has diverse applications ranging from lead prediction to automated reaction mapping and visual alignment of similar compounds. Many different algorithms have been developed [1], both exact and approximate. Since the MCS problem is NP-complete, the strict time constraints of most appl...
Maximum Common Substructure-Based Data Fusion in Similarity Searching
Data fusion has been shown to work very well when applied to fingerprint-based similarity searching, yet little is known of its application to maximum common substructure (MCS)-based similarity searching. Two similarity search applications of the MCS will be focused on here. Typically, the number of bonds in the MCS, as well as the bonds in the two molecules being compared, are used in a simila...